fraudulent transaction
EmDT: Embedding Diffusion Transformer for Tabular Data Generation in Fraud Detection
Imbalanced datasets pose a difficulty in fraud detection, as classifiers are often biased toward the majority class and perform poorly on rare fraudulent transactions. Synthetic data generation is therefore commonly used to mitigate this problem. In this work, we propose the Clustered Embedding Diffusion-Transformer (EmDT), a diffusion model designed to generate fraudulent samples. Our key innovation is to leverage UMAP clustering to identify distinct fraudulent patterns, and train a Transformer denoising network with sinusoidal positional embeddings to capture feature relationships throughout the diffusion process. Once the synthetic data has been generated, we employ a standard decision-tree-based classifier (e.g., XGBoost) for classification, as this type of model remains better suited to tabular datasets. Experiments on a credit card fraud detection dataset demonstrate that EmDT significantly improves downstream classification performance compared to existing oversampling and generative methods, while maintaining comparable privacy protection and preserving feature correlations present in the original data.
- North America > United States > Arizona > Maricopa County > Tempe (0.04)
- Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)
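The EmDT abstract above mentions sinusoidal positional embeddings in the Transformer denoising network. A minimal sketch of the standard sinusoidal formula is given here for orientation. The function name, the base of 10000, and the use of the feature-column index as the position are assumptions for illustration; the abstract does not confirm these details.

```python
import math

def sinusoidal_embedding(position, dim, base=10000.0):
    """Return the standard sinusoidal embedding for one position.

    Even indices hold sine terms and odd indices hold cosine terms.
    `dim` must be even. `base` is the conventional 10000 (an assumption here).
    """
    if dim % 2 != 0:
        raise ValueError("dim must be even")
    emb = []
    for i in range(dim // 2):
        freq = 1.0 / (base ** (2 * i / dim))
        emb.append(math.sin(position * freq))
        emb.append(math.cos(position * freq))
    return emb

# One embedding per feature column of a hypothetical 5-column table.
table_embeddings = [sinusoidal_embedding(col, dim=8) for col in range(5)]
```

Each sine/cosine pair has unit norm, so the embedding encodes position through phase rather than magnitude.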
Quantum Topological Graph Neural Networks for Detecting Complex Fraud Patterns
Doost, Mohammad, Manthouri, Mohammad
We propose a novel QTGNN framework for detecting fraudulent transactions in large-scale financial networks. By integrating quantum embedding, variational graph convolutions, and topological data analysis, QTGNN captures complex transaction dynamics and structural anomalies indicative of fraud. The methodology includes quantum data embedding with entanglement enhancement, variational quantum graph convolutions with non-linear dynamics, extraction of higher-order topological invariants, hybrid quantum-classical anomaly learning with adaptive optimization, and interpretable decision-making via topological attribution. Rigorous convergence guarantees ensure stable training on noisy intermediate-scale quantum (NISQ) devices, while stability of topological signatures provides robust fraud detection. Optimized for NISQ hardware with circuit simplifications and graph sampling, the framework scales to large transaction networks. Simulations on financial datasets, such as PaySim and Elliptic, benchmark QTGNN against classical and quantum baselines, using metrics like ROC-AUC, precision, and false positive rate. An ablation study evaluates the contributions of quantum embeddings, topological features, non-linear channels, and hybrid learning. QTGNN offers a theoretically sound, interpretable, and practical solution for financial fraud detection, bridging quantum machine learning, graph theory, and topological analysis.
- Overview (1.00)
- Research Report (0.82)
- Law Enforcement & Public Safety > Fraud (1.00)
- Information Technology > Security & Privacy (1.00)
- Banking & Finance > Trading (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Credit Card Fraud Detection
Popova, Iva, Gardi, Hamza A. A.
Iva Popova, Hamza A. A. Gardi; ETIT - KIT, Germany; IIIT at ETIT - KIT, Germany. Abstract: Credit card fraud remains a significant challenge due to class imbalance and fraudsters mimicking legitimate behavior. This study evaluates five machine learning models (Logistic Regression, Random Forest, XGBoost, K-Nearest Neighbors (KNN), and Multi-Layer Perceptron (MLP)) on a real-world dataset using undersampling, SMOTE, and a hybrid approach. Our models are evaluated on the original imbalanced test set to better reflect real-world performance. Results show that the hybrid method achieves the best balance between recall and precision, especially improving MLP and KNN performance. Introduction: Financial fraud is a significant issue that has been continuously increasing over the past few years due to the ever-growing volume of online transactions conducted with credit cards. Credit card fraud (CCF) refers to a type of fraud in which an individual other than the cardholder unlawfully conducts transactions using a card that is stolen, lost, or otherwise misused [1]. CCF has resulted in billions of dollars in losses for banks and other online payment platforms. According to the Federal Trade Commission (FTC), there were 449,076 reports of CCF in 2024, representing a 7.8% increase from the previous year [2]. Given this trend, new methods must be employed to capture patterns and dependencies in the data.
- Europe > Germany (0.44)
- Asia > Middle East > Iraq > Wasit Governorate (0.04)
- Asia > Middle East > Iraq > Kurdistan Region > Duhok Governorate > Duhok (0.04)
- Law Enforcement & Public Safety > Fraud (1.00)
- Information Technology (1.00)
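The Popova and Gardi entry above compares undersampling, SMOTE, and a hybrid of the two. The following is a rough sketch of what such a hybrid could look like, assuming random undersampling of the majority class followed by SMOTE-style interpolation between minority rows. The function names, target sizes, and toy data are hypothetical and not taken from the paper; in line with the abstract, resampling would be applied to the training data only.

```python
import random

def smote_like(minority, n_new, k=3, rng=None):
    """Create n_new synthetic rows by interpolating between a minority row
    and one of its k nearest minority neighbours (the core SMOTE idea)."""
    rng = rng or random.Random(0)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Sort the other minority rows by squared Euclidean distance to base.
        others = [m for m in minority if m is not base]
        others.sort(key=lambda m: sum((a - b) ** 2 for a, b in zip(base, m)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation weight in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic

def hybrid_resample(majority, minority, target_majority, target_minority, rng=None):
    """Undersample the majority class, then oversample the minority class."""
    rng = rng or random.Random(0)
    kept_majority = rng.sample(majority, target_majority)
    extra = smote_like(minority, target_minority - len(minority), rng=rng)
    return kept_majority, minority + extra

# Toy training split: 200 legitimate rows, 6 fraudulent rows (made-up values).
rng = random.Random(42)
legit = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(200)]
fraud = [[rng.gauss(3, 0.5), rng.gauss(3, 0.5)] for _ in range(6)]
legit_bal, fraud_bal = hybrid_resample(legit, fraud, 60, 30, rng=rng)
```

Because every synthetic row lies on a segment between two real fraud rows, the oversampled class stays inside the region the real fraud rows span.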
Foe for Fraud: Transferable Adversarial Attacks in Credit Card Fraud Detection
Fok, Jan Lum, Zeng, Qingwen, Chen, Shiping, Fawkes, Oscar, Chen, Huaming
Credit card fraud detection (CCFD) is a critical application of Machine Learning (ML) in the financial sector, where accurately identifying fraudulent transactions is essential for mitigating financial losses. ML models have demonstrated their effectiveness in fraud detection tasks, particularly on tabular datasets. While adversarial attacks have been extensively studied in computer vision and deep learning, their impact on ML models, particularly those trained on CCFD tabular datasets, remains largely unexplored. These latent vulnerabilities pose significant threats to the security and stability of the financial industry, especially in high-value transactions where losses could be substantial. To address this gap, we present a holistic framework that investigates the robustness of CCFD ML models against adversarial perturbations under different circumstances. Specifically, gradient-based attack methods are applied to tabular credit card transaction data in both black-box and white-box adversarial attack settings. Our findings confirm that tabular data is also susceptible to subtle perturbations, highlighting the need for heightened awareness among financial technology practitioners regarding ML model security and trustworthiness. Furthermore, experiments that transfer adversarial samples from gradient-based attack methods to non-gradient-based models also verify our findings. Our results demonstrate that such attacks remain effective, emphasizing the necessity of developing robust defenses for CCFD algorithms.
- Law Enforcement & Public Safety > Fraud (1.00)
- Information Technology > Security & Privacy (1.00)
- Banking & Finance (1.00)
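The "Foe for Fraud" abstract above refers to gradient-based attacks on tabular transaction data. As a sketch of that family, the fast gradient sign method (FGSM) is shown below against a logistic-regression scorer in a white-box setting. The choice of FGSM, the weights, the transaction values, and the step size are all assumptions for illustration; the abstract does not name the specific attacks or models used.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(w, b, x):
    """Probability of the fraud class under a logistic-regression model."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def fgsm(w, b, x, y, eps):
    """Fast gradient sign method for logistic regression.

    For cross-entropy loss, the gradient w.r.t. the input is (p - y) * w,
    so each feature moves by eps in the direction that increases the loss.
    """
    p = predict_proba(w, b, x)
    grad = [(p - y) * wi for wi in w]
    sign = lambda g: (g > 0) - (g < 0)
    return [xi + eps * sign(g) for xi, g in zip(x, grad)]

# Hypothetical white-box model and a fraudulent transaction (label 1).
w, b = [1.5, -2.0, 0.5], -0.25
x = [1.0, -0.5, 0.5]
x_adv = fgsm(w, b, x, y=1, eps=0.7)

p_clean = predict_proba(w, b, x)    # above 0.5: flagged as fraud
p_adv = predict_proba(w, b, x_adv)  # below 0.5: the attack evades detection
```

In the transfer setting the abstract describes, `x_adv` would then be scored by a different, non-gradient-based model (for example a tree ensemble) to check whether the evasion carries over.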
Semi-Supervised Supply Chain Fraud Detection with Unsupervised Pre-Filtering
Moradi, Fatemeh, Tarif, Mehran, Homaei, Mohammadhossein
Detecting fraud in modern supply chains is a growing challenge, driven by the complexity of global networks and the scarcity of labeled data. Traditional detection methods often struggle with class imbalance and limited supervision, reducing their effectiveness in real-world applications. This paper proposes a novel two-phase learning framework to address these challenges. In the first phase, the Isolation Forest algorithm performs unsupervised anomaly detection to identify potential fraud cases and reduce the volume of data requiring further analysis. In the second phase, a self-training Support Vector Machine (SVM) refines the predictions using both labeled and high-confidence pseudo-labeled samples, enabling robust semi-supervised learning. The proposed method is evaluated on the DataCo Smart Supply Chain Dataset, a comprehensive real-world supply chain dataset with fraud indicators. It achieves an F1-score of 0.817 while maintaining a false positive rate below 3.0%. These results demonstrate the effectiveness and efficiency of combining unsupervised pre-filtering with semi-supervised refinement for supply chain fraud detection under real-world constraints, though we acknowledge limitations regarding concept drift and the need for comparison with deep learning approaches.
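The two-phase framework in the abstract above (Isolation Forest pre-filtering, then a self-training SVM) can be sketched with scikit-learn as follows. This assumes scikit-learn and NumPy are available; the synthetic data, the contamination rate, the number of labeled rows, and the confidence threshold are placeholders, not values from the paper, and the DataCo dataset is not used here.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.semi_supervised import SelfTrainingClassifier
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Synthetic stand-in data: 300 normal orders and 30 anomalous ones.
X = np.vstack([rng.normal(0, 1, (300, 2)), rng.normal(4, 1, (30, 2))])
y_true = np.array([0] * 300 + [1] * 30)

# Phase 1: unsupervised pre-filter. fit_predict returns -1 for outliers.
iso = IsolationForest(contamination=0.2, random_state=0)
suspect = iso.fit_predict(X) == -1
X_s, y_s = X[suspect], y_true[suspect]

# Phase 2: keep labels for a few suspects only; -1 marks unlabeled rows.
y_semi = np.full(len(y_s), -1)
labeled = np.concatenate([np.flatnonzero(y_s == 0)[:5], np.flatnonzero(y_s == 1)[:5]])
y_semi[labeled] = y_s[labeled]

# Self-training adds pseudo-labels whose predicted probability >= threshold.
model = SelfTrainingClassifier(SVC(probability=True, random_state=0), threshold=0.9)
model.fit(X_s, y_semi)
pred = model.predict(X_s)
```

The pre-filter shrinks the data passed to the more expensive SVM stage, which is the efficiency argument the abstract makes; the trade-off is that any fraud case the Isolation Forest misses never reaches the second phase.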
Credit Card Fraud Detection Using RoFormer Model With Relative Distance Rotating Encoding
Fraud detection is one of the most important challenges that financial systems must address. Detecting fraudulent transactions is critical for payment gateway companies like Flow Payment, which process millions of transactions monthly and require robust security measures to mitigate financial risks. Increasing transaction authorization rates while reducing fraud is essential for providing a good user experience and building a sustainable business. For this reason, discovering novel and improved methods to detect fraud requires continuous research and investment for any company that wants to succeed in this industry. In this work, we introduced a novel method for detecting transactional fraud by incorporating the Relative Distance Rotating Encoding (ReDRE) in the RoFormer model. The incorporation of angle rotation using ReDRE enhances the characterization of time series data within a Transformer, leading to improved fraud detection by better capturing temporal dependencies and event relationships.
- Law Enforcement & Public Safety > Fraud (1.00)
- Information Technology > Security & Privacy (1.00)
- Banking & Finance (1.00)
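The RoFormer entry above builds on rotary position encoding. The abstract does not spell out how ReDRE defines its rotation angles, so the sketch below shows only the standard rotary scheme with integer positions, as an assumption-laden illustration of the "angle rotation" idea. The vectors and positions are made up.

```python
import math

def rotate(vec, position, base=10000.0):
    """Apply a rotary position encoding to an even-length vector.

    Each consecutive pair (x, y) is rotated by the angle position * theta_i,
    with theta_i = base ** (-2i / d), as in the standard RoFormer scheme.
    """
    d = len(vec)
    out = []
    for i in range(d // 2):
        angle = position * base ** (-2 * i / d)
        c, s = math.cos(angle), math.sin(angle)
        x, y = vec[2 * i], vec[2 * i + 1]
        out.extend([x * c - y * s, x * s + y * c])
    return out

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# Hypothetical query/key vectors for two transactions in a sequence.
q = [0.3, -1.2, 0.8, 0.5]
k = [1.0, 0.4, -0.7, 0.2]

# The attention score depends only on the offset between positions (here 3).
score_a = dot(rotate(q, 5), rotate(k, 2))
score_b = dot(rotate(q, 105), rotate(k, 102))
```

Because the score depends on the offset alone, replacing the integer position with a time-derived quantity is one plausible way a relative-distance variant could encode gaps between events.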
A Data Balancing and Ensemble Learning Approach for Credit Card Fraud Detection
This research introduces an innovative method for identifying credit card fraud by combining the SMOTE-KMEANS technique with an ensemble machine learning model. The proposed model was benchmarked against traditional models such as logistic regression, decision trees, random forests, and support vector machines. Performance was evaluated using metrics, including accuracy, recall, and area under the curve (AUC). The results demonstrated that the proposed model achieved superior performance, with an AUC of 0.96 when combined with the SMOTE-KMEANS algorithm. This indicates a significant improvement in detecting fraudulent transactions while maintaining high precision and recall. The study also explores the application of different oversampling techniques to enhance the performance of various classifiers. The findings suggest that the proposed method is robust and effective for classification tasks on balanced datasets. Future research directions include further optimization of the SMOTE-KMEANS approach and its integration into existing fraud detection systems to enhance financial security and consumer protection.
- North America > United States > New York (0.04)
- Asia > China > Tianjin Province > Tianjin (0.04)
- Law Enforcement & Public Safety > Fraud (1.00)
- Banking & Finance (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.69)
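The SMOTE-KMEANS entry above reports accuracy, recall, and AUC. A small sketch of how these three metrics are computed, on made-up scores, shows why accuracy alone can mislead on imbalanced data. The data and the 0.5 decision threshold are assumptions for illustration.

```python
def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def recall(y_true, y_pred):
    """Share of actual fraud cases (label 1) that were flagged."""
    flagged = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    return flagged / sum(y_true)

def roc_auc(y_true, scores):
    """AUC as the probability that a random fraud case outscores a random
    legitimate case, counting ties as one half (pairwise formulation)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Made-up scores: 95 legitimate transactions, 5 fraudulent ones.
y_true = [0] * 95 + [1] * 5
scores = [0.1] * 90 + [0.6] * 5 + [0.2, 0.4, 0.7, 0.8, 0.9]
y_pred = [int(s >= 0.5) for s in scores]

always_legit = [0] * 100  # a useless model that never flags fraud
```

Here the model that never flags fraud reaches higher accuracy (0.95) than the scored model (0.93) while catching no fraud at all, which is why recall and AUC are reported alongside accuracy.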
Enforcing Cybersecurity Constraints for LLM-driven Robot Agents for Online Transactions
Shah, Shraddha Pradipbhai, Deshpande, Aditya Vilas
The integration of Large Language Models (LLMs) into autonomous robotic agents for conducting online transactions poses significant cybersecurity challenges. This study aims to enforce robust cybersecurity constraints to mitigate the risks associated with data breaches, transaction fraud, and system manipulation. The background focuses on the rise of LLM-driven robotic systems in e-commerce, finance, and service industries, alongside the vulnerabilities they introduce. A novel security architecture combining blockchain technology with multi-factor authentication (MFA) and real-time anomaly detection was implemented to safeguard transactions. Key performance metrics such as transaction integrity, response time, and breach detection accuracy were evaluated, showing improved security and system performance. The results highlight that the proposed architecture reduced fraudulent transactions by 90%, improved breach detection accuracy to 98%, and ensured secure transaction validation within a latency of 0.05 seconds. These findings emphasize the importance of cybersecurity in the deployment of LLM-driven robotic systems and suggest a framework adaptable to various online platforms.
- Information Technology > Security & Privacy (1.00)
- Government > Military > Cyberwarfare (1.00)
- Information Technology > Artificial Intelligence > Robots (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Financial fraud detection system based on improved random forest and gradient boosting machine (GBM)
This paper proposes a financial fraud detection system based on improved Random Forest (RF) and Gradient Boosting Machine (GBM). Specifically, the system introduces a novel model architecture called GBM-SSRF (Gradient Boosting Machine with Simplified and Strengthened Random Forest), which combines the optimization capabilities of the GBM with improved randomization. The computational efficiency and feature extraction capabilities of the Simplified and Strengthened Random Forest (SSRF) significantly improve the performance of financial fraud detection. Although the traditional random forest model has good classification capabilities, it has high computational complexity when faced with large-scale data and has certain limitations in feature selection. As a commonly used ensemble learning method, the GBM model has significant advantages in optimizing performance and handling nonlinear problems. However, GBM takes a long time to train and is prone to overfitting when data samples are unbalanced. In response to these limitations, this paper simplifies and strengthens the structure of the random forest, reducing computational complexity and improving feature selection. In addition, the optimized random forest is embedded into the GBM framework, so that the model maintains efficiency and stability with the help of GBM's gradient optimization capability. Experiments show that the GBM-SSRF model not only performs well but is also robust and generalizes well, providing an efficient and reliable solution for financial fraud detection.
- Asia > Singapore (0.05)
- North America > United States > California > Yolo County > Davis (0.04)
- Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.71)
Impact of Sampling Techniques and Data Leakage on XGBoost Performance in Credit Card Fraud Detection
Credit card fraud detection remains a critical challenge in financial security, with machine learning models like XGBoost (eXtreme Gradient Boosting) emerging as powerful tools for identifying fraudulent transactions. However, the inherent class imbalance in credit card transaction datasets poses significant challenges for model performance. Although sampling techniques are commonly used to address this imbalance, their implementation sometimes precedes the train-test split, potentially introducing data leakage. This study presents a comparative analysis of XGBoost's performance in credit card fraud detection under three scenarios: first, without any imbalance handling techniques; second, with sampling techniques applied only to the training set after the train-test split; and third, with sampling techniques applied before the train-test split. We utilized a dataset from Kaggle of 284,807 credit card transactions, containing 0.172% fraudulent cases, to evaluate these approaches. Our findings show that although sampling strategies enhance model performance, the reliability of results is greatly impacted by when they are applied. Due to a data leakage issue that frequently occurs in machine learning models during the sampling phase, XGBoost models trained on data where sampling was applied prior to the train-test split may have displayed artificially inflated performance metrics. Surprisingly, models trained with sampling techniques applied solely to the training set demonstrated significantly lower results than those with pre-split sampling, all the while preserving the integrity of the evaluation process.
- Law Enforcement & Public Safety > Fraud (1.00)
- Banking & Finance (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
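The leakage mechanism described in the XGBoost entry above can be made concrete with a small sketch. It uses plain random oversampling (row duplication) rather than any specific technique from the paper, and the toy data, split ratio, and helper names are hypothetical. It counts how many test rows also appear in the training set under each ordering.

```python
import random

def split(rows, test_frac, rng):
    rows = rows[:]
    rng.shuffle(rows)
    n_test = int(len(rows) * test_frac)
    return rows[n_test:], rows[:n_test]  # train, test

def oversample(rows, factor, rng):
    """Random oversampling: duplicate fraud rows (label 1) `factor` times."""
    fraud = [r for r in rows if r[1] == 1]
    return rows + [rng.choice(fraud) for _ in range(factor * len(fraud))]

def leaked(train, test):
    """Number of test rows whose id also appears in the training set."""
    train_ids = {r[0] for r in train}
    return sum(1 for r in test if r[0] in train_ids)

# Toy data: (transaction_id, label); 20 of 1000 rows are fraud.
data = [(i, 1 if i < 20 else 0) for i in range(1000)]

# Leaky order: oversample first, then split. Copies land on both sides.
oversampled = oversample(data, 10, random.Random(0))
train_bad, test_bad = split(oversampled, 0.3, random.Random(1))

# Safe order: split first, then oversample the training set only.
train_ok, test_ok = split(data, 0.3, random.Random(1))
train_ok = oversample(train_ok, 10, random.Random(0))
```

With the leaky ordering, `leaked(train_bad, test_bad)` is positive because copies of the same fraud row sit on both sides of the split, so the test score partly measures memorization. With the safe ordering, `leaked(train_ok, test_ok)` is zero.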